Cache results of is_torch_tpu_available() #18777
I'm not sure what caused the CI failure. It doesn't seem to be related to this PR.
This sounds fair to me, but I'll let @sgugger, mastermind of the Trainer, review when he's back from leave at the end of the week!
@LysandreJik Thanks for the review. Sure, we can wait for @sgugger.
Thanks a lot for your PR! The test failures seem unrelated; I have tried re-running them. Could you rebase on main if the failures persist?
Thanks for bearing with us. The test failures are spurious and unrelated, so we can merge this.
* Cache results of is_torch_tpu_available()
* Update src/transformers/utils/import_utils.py
* Update src/transformers/utils/import_utils.py
What does this PR do?
`xm.xla_device()` (called by `is_torch_tpu_available()`) hangs when it is called multiple times and no XLA devices are available, which causes the Trainer to hang. Because `torch_xla` is currently used whenever it is installed in the active Python environment, I hit this issue even when I only wanted to run the Trainer with PyTorch on GPU.

The root cause of this behavior in `torch_xla` is still under investigation (see pytorch/xla#3939).

To work around this issue, this PR adds `lru_cache` to `is_torch_tpu_available()`, so that `xm.xla_device()` is guaranteed to be called only once when no XLA device is available.
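The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `probe_xla_device`, `is_tpu_available_sketch`, and the `probe_calls` counter are hypothetical names, and the probe only simulates an environment without an XLA device instead of importing `torch_xla`.

```python
from functools import lru_cache

# Hypothetical counter, only here to make the number of probe calls observable.
probe_calls = 0


def probe_xla_device():
    """Hypothetical stand-in for the expensive device probe (e.g. xm.xla_device())."""
    global probe_calls
    probe_calls += 1
    # Simulate an environment in which no XLA device is available.
    raise RuntimeError("no XLA device available")


@lru_cache()
def is_tpu_available_sketch():
    """Return True if the probe succeeds. The result is cached after the first call."""
    try:
        probe_xla_device()
        return True
    except RuntimeError:
        return False


# Repeated checks reuse the cached result, so the probe runs only once.
results = [is_tpu_available_sketch() for _ in range(3)]
```

Because the function catches the failure and returns `False`, that return value is what gets cached, so later calls never reach the probe again. One trade-off of this approach is that a device that becomes available later in the same process would not be detected unless the cache is cleared with `is_tpu_available_sketch.cache_clear()`.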
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@muellerzr @sgugger